Abstract

We investigate the problem of succinctly representing an arbitrary permutation,
$\pi$, on $\{0, \ldots, n-1\}$ so that $\pi^k(i)$ can be computed quickly for any $i$ and any
(positive or negative) integer power $k$. A representation taking $(1+\epsilon) n \lg n + O(1)$
bits suffices to compute arbitrary powers in constant time, for any positive
constant $\epsilon < 1$. A representation taking the optimal $\lceil \lg n! \rceil + o(n)$ bits can be
used to compute arbitrary powers in $O(\lg n / \lg\lg n)$ time.

We then consider the more general problem of succinctly representing an
arbitrary function, $f : [n] \to [n]$, so that $f^k(i)$ can be computed quickly for any $i$
and any integer power $k$. We give a representation that takes $(1+\epsilon) n \lg n + O(1)$
bits, for any positive constant $\epsilon < 1$, and computes arbitrary positive powers in
constant time. It can also be used to compute $f^k(i)$, for any negative integer $k$,
in optimal $O(1 + |f^k(i)|)$ time.

We place emphasis on the redundancy, or the space beyond the information- 
theoretic lower bound that the data structure uses in order to support operations 
efficiently. A number of lower bounds have recently been shown on the redun- 
dancy of data structures. These lower bounds confirm the space-time optimality 
of some of our solutions. Furthermore, the redundancy of one of our structures 
"surpasses" a recent lower bound by Golynski [Golynski, SODA 2009], thus 
demonstrating the limitations of this lower bound. 

Keywords: Succinct data structures, Space redundancy, Permutations,
Functions, Benes network, Succinct tree representations, Level ancestor queries



*Work supported in part by UISTRF project 2001.04/IT. 
Preliminary versions of these results have appeared in the Proceedings of International 
Colloquium on Automata, Languages and Programming (ICALP) in 2003 and 2004. 
'Corresponding author 
Email addresses: imunro@uwaterloo.ca (J. Ian Munro), rr29@leicester.ac.uk (Rajeev
Raman), vraman@imsc.res.in (Venkatesh Raman), ssrao@cse.snu.ac.kr (S. Srinivasa Rao)



Preprint submitted to Elsevier 



August 10, 2011 



1. Introduction 

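For an arbitrary function $f$ from $[n] = \{0, \ldots, n-1\}$ to $[n]$, define $f^k(i)$, for
all $i \in [n]$ and any integer $k$, as follows:

$$
f^k(i) =
\begin{cases}
i & \text{when } k = 0, \\
f(f^{k-1}(i)) & \text{when } k > 0, \text{ and} \\
\{\, j \mid f^{-k}(j) = i \,\} & \text{when } k < 0.
\end{cases}
$$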
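
The definition can be evaluated directly, which is a useful reference point for the faster structures discussed later. The following is a minimal sketch, assuming $f$ is given as a Python list with `f[x]` in `range(len(f))`; the function name `f_power` is hypothetical and not part of the paper.

```python
def f_power(f, i, k):
    """Evaluate f^k(i) straight from the definition.

    For k >= 0 the result is a single element of [n]; for k < 0 it is the
    set {j : f^(-k)(j) = i}, which may be empty.
    """
    if k >= 0:
        for _ in range(k):
            i = f[i]
        return i
    # Negative power: collect every j whose (-k)-th iterate equals i.
    return {j for j in range(len(f)) if f_power(f, j, -k) == i}
```

This takes $k$ steps for positive $k$ and examines every element for negative $k$, so it only serves to pin down the semantics of the query.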

We consider the following problem: we are given a specific and arbitrary (static)
function $f$ from $[n]$ to $[n]$ that arises in some application. We want to represent
$f$ (after pre-processing $f$) in a data structure that, given $k$ and $i$ as parameters,
rapidly returns the value of $f^k(i)$. For the sake of simplicity, in the rest of the
paper we assume that the given number $k$ is bounded by some polynomial in $n$.

Our interest is in succinct, or highly space-efficient, representations of such
functions, whose space usage is close to the information-theoretic lower bound
for representing such a function. Since there are $n^n$ functions from $[n]$ to $[n]$,
such a function cannot be represented in less than $\lceil n \lg n \rceil$ bits.¹ Any amount
of memory used by a data structure that represents such a function, above and
beyond this lower bound, is termed the redundancy of the data structure. We
also consider the case where $f$ is given as a "black box", i.e. the data structure
is given access to a routine to evaluate $f(i)$ for any $i \in [n]$; in this case any
amount of memory whatsoever used by the data structure is its redundancy. The
fundamental aim is to understand precisely the minimum redundancy required
to support operations rapidly.

Clearly, the above problem is trivial if space is not an issue. To facilitate the
computation in constant time, one could store $f^k(i)$ for all $i$ and $k$ ($|k| < n$, along
with some extra information), but that would require $\Omega(n^2)$ words of memory.
The most natural compromise is to retain the values of $f^k(i)$ where $2 \le k \le n$
is a power of 2. This $O(n \lg n)$-word representation easily yields a logarithmic
evaluation scheme. Unfortunately, this representation not only uses non-linear
space (and is relatively slow) but also does not support queries for the negative
powers of $f$ efficiently. Given $f$ in a natural representation (the sequence $f(i)$
for $i = 0, \ldots, n-1$, or as a black box), a highly space-efficient solution is to
store no additional data structures (zero redundancy), and to compute $f^k(i)$ in
$k$ steps, for positive $k$. However, this is unacceptably slow for large $k$, and still
does not address the issue of negative powers.
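
The power-of-2 compromise can be sketched as follows. This is an illustration only, assuming $f$ is a Python list; the names `build_doubling_table` and `power_by_doubling` are hypothetical, and for convenience the sketch also keeps $f$ itself as row 0 of the table.

```python
def build_doubling_table(f):
    """table[j][i] = f^(2^j)(i) for every 2^j <= n: about n lg n words."""
    n = len(f)
    table = [list(f)]
    while (1 << len(table)) <= n:
        prev = table[-1]
        # Composing f^(2^j) with itself yields f^(2^(j+1)).
        table.append([prev[prev[i]] for i in range(n)])
    return table


def power_by_doubling(table, i, k):
    """Compute f^k(i) for 0 <= k < 2^len(table), one lookup per set bit of k."""
    j = 0
    while k:
        if k & 1:
            i = table[j][i]
        k >>= 1
        j += 1
    return i
```

The lookup loop makes the logarithmic query time visible, and the table shape makes the non-linear space visible; negative powers are not addressed at all, as noted above.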

1.1. Results 

Our results are primarily in the unit-cost RAM with word size $O(\log n)$ bits,
where we measure the running time and the bits of space used by an algorithm.
We also consider the "black-box" model, known also as the systematic model
[10], where we look at the number of evaluations of $f$ in addition to the running
time and space (in bits) used by the algorithm. Lower bound results are
discussed in either the black-box model or in the cell-probe model, where we
consider the space (in bits) used by the algorithm, and the running time is the
number of $w$-bit words of the data structure read by the algorithm to answer a
query (and all other computation is for free). Finally, we also briefly consider
the bit-probe model, which is the cell-probe model with $w = 1$ [24].

¹lg denotes the logarithm base 2.



1.1.1. Permutations 

We begin by considering a special case, where the function is a permutation
(abbreviated hereafter as a perm [22]) of $[n] = \{0, \ldots, n-1\}$. This turns out not
only to be an interesting sub-case in its own right, but is also essential to our
solution to the general problem. Note that for storing perms, the information-theoretic
lower bound is $\mathcal{P}(n) = \lceil \lg n! \rceil \approx n \lg n - 1.44n$ bits, so the obvious
representation (as an array storing $\pi(i)$ for $i = 1, \ldots, n$) has redundancy $\Theta(n)$
bits (and of course does not support inverses or powers). We obtain the following
results for representing perms:

1. We give a representation that uses $\mathcal{P}(n) + O(n(\lg\lg n)^5/(\lg n)^2)$ bits, and
supports $\pi()$ and $\pi^{-1}()$ in $O(\lg n/\lg\lg n)$ time.

2. In the "black box" model, where access to the perm is only through the $\pi()$
operation, we show how to support $\pi^{-1}()$ in $O(t)$ time and at most $t + 1$
evaluations of $\pi()$, using $(n/t)(\lg n + \lg t + O(1))$ bits, for any $1 < t < n$.

3. Given a structure that represents a perm $\pi$ in space $S(n)$ bits, and supports
$\pi()$ and $\pi^{-1}()$ in time $t_f(n)$ and $t_i(n)$ respectively, we show how
to represent a given perm $\pi'$ on $[n]$ in space $S(n) + O(n \lg n/\lg\lg n)$
bits (or $S(n) + O(\sqrt{n} \lg n)$ bits) and support arbitrary powers of $\pi'$ in
$t_f(n) + t_i(n) + O(1)$ time (or $t_f(n) + t_i(n) + O(\lg\lg n)$ time, respectively).

As corollaries, we get the following representations of perms:

4. one that uses $\mathcal{P}(n) + O((n/t)\lg n)$ bits, and supports $\pi()$ in $O(1)$ time and
$\pi^{-1}()$ in $O(t)$ time, for any $t < \lg n$.

5. one that uses $\mathcal{P}(n) + O((n/t)\lg n)$ bits and supports $\pi^k()$ in $O(t)$ time for
arbitrary $k$, for any $t < \lg n$.

6. one that uses $\mathcal{P}(n) + O(n(\lg\lg n)^5/(\lg n)^2)$ bits and supports $\pi^k()$ in
$O(\lg n/\lg\lg n)$ time for arbitrary $k$.
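
To see why arbitrary powers reduce to knowing where an element sits on its cycle, the following sketch computes $\pi^k(i)$ from an explicit cycle decomposition. It is not the representation developed in this paper: it spends $\Theta(n)$ words on plain arrays and only illustrates the arithmetic on cycle positions. The names `build_cycle_index` and `perm_power` are hypothetical.

```python
def build_cycle_index(perm):
    """Decompose perm (a list) into cycles; record each element's cycle and position."""
    n = len(perm)
    cycle_of = [None] * n   # index of the cycle containing each element
    pos_of = [0] * n        # position of each element within its cycle
    cycles = []
    for start in range(n):
        if cycle_of[start] is not None:
            continue
        cyc = []
        x = start
        while cycle_of[x] is None:
            cycle_of[x] = len(cycles)
            pos_of[x] = len(cyc)
            cyc.append(x)
            x = perm[x]
        cycles.append(cyc)
    return cycles, cycle_of, pos_of


def perm_power(index, i, k):
    """Return perm^k(i) for any integer k (negative k walks the cycle backwards)."""
    cycles, cycle_of, pos_of = index
    cyc = cycles[cycle_of[i]]
    # Python's % always yields a non-negative result, so negative k is handled too.
    return cyc[(pos_of[i] + k) % len(cyc)]
```

For example, with `perm = [2, 0, 1, 4, 3]` the cycles are `[0, 2, 1]` and `[3, 4]`, and `perm_power(index, 0, -1)` steps one position back on the first cycle.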

Related Work 

Perms are fundamental in computer science and have been the focus of ex- 
tensive study. A number of papers have dealt with issues pertaining to perm 
generation, membership in perm groups etc. There has also been work on 
space-efficient representation of restricted classes of perms, such as the perms 
representing the lexicographic order of the suffixes of a string [l?], EH , or so- 
called approximately min-wise independent perms, used for document similarity 
estimation Q . Our paper is the first to study the space-efficient representation 
of general perms so that general powers can be computed efficiently (however, 
see the discussion on Hellman's work in Section rO|) . 






Recently Golynski 14, [lj| showed a number of lower bounds for the redundancy
of permutation representations. He showed a space lower bound of
$\Omega((n/t) \lg(n/t))$ bits for Item (2) for any algorithm that evaluates $\pi$ at most
$t < n/2$ times [lU Theorem 17]. Thus, (2) is asymptotically optimal for all
$t = n^{1-\Omega(1)}$. Furthermore, Golynski [14] showed that the redundancy of (4) is
asymptotically optimal in the cell probe model with word size $w = \lg n$: specifically,
that any perm representation which supports $\pi()$ in $O(1)$ probes and $\pi^{-1}()$
in $t$ probes, for any $t < (1/16)(\lg n/\lg\lg n)$, must have asymptotically the same
redundancy as (4). He also shows that any perm representation that supports both
$\pi()$ and $\pi^{-1}()$ in at most $t$ cell probes, for any $t < (1/16)(\lg n/\lg\lg n)$, must have
redundancy $\Omega(n(\lg\lg n)^2/\lg n)$. In the preliminary version of this paper [26], a perm
representation was given that supported $\pi()$ and $\pi^{-1}()$ in $O(\lg n/\lg\lg n)$ time,
and had redundancy $\Theta(n(\lg\lg n)^2/\lg n)$. Golynski suggested that the result of
[26] was "optimal up to constant factor in the cell probe model". However,
we note that the lower bound is quite sensitive to the precise constant in the
number of probes: our result (1) obtains an asymptotically smaller redundancy
by using over $2\lg n/\lg\lg n$ cell probes.

1.1.2. Functions 

For general functions from $[n]$ to $[n]$, our main result is that we reduce the
problem of representing functions to that of representing permutations, with
$O(n)$ additional bits. As corollaries, we get the following representations of
functions, both of which use close to the information-theoretic minimum amount
of space, and answer queries in optimal time:

1. one that uses $n \lg n (1 + 1/t) + O(1)$ bits, and supports $f^k(i)$ in
$O(1 + |f^k(i)| \cdot t)$ time for any integer $k$, and for any $t < \lg n/\lg\lg n$.

2. one that uses $n \lg n + O(n)$ bits and supports $f^k(i)$ in
$O((1 + |f^k(i)|) \cdot (\lg n/\lg\lg n))$ time, for any integer $k$.

Along the way, we show that an unlabelled static $n$-node rooted tree can be
represented using the optimal $2n + o(n)$ bits of space to answer level-ancestor
queries (given a node $x$ and a number $k$, report the $k$-th ancestor of $x$) and
level-successor/level-predecessor queries (report the next/previous node at the
same level as the given node) in constant time. We represent the tree in $2n$
bits as a balanced parenthesis (BP) sequence. The key technical contribution is
to provide a $o(n)$-bit index for excess search in a BP sequence. For a position
$i$ in a BP sequence, excess($i$) is the number of unclosed open parentheses up to
that position (this corresponds to the depth of a node in the tree represented
by the BP). The operation next-excess($i$, $k$), starting at a position $i$ in the BP
sequence, finds the next position $j$ whose excess is $k$; we support next-excess in
$O(1)$ time provided that $j$'s excess is at most $(\lg n)^c$ below or above the excess
of $i$ (i.e., $|k - \mathrm{excess}(i)| = O((\lg n)^c)$), for any fixed constant $c > 0$. To add
standard navigational operations, one can use existing $o(n)$-bit indices for BP
sequences [25].
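
The two operations can be pinned down with a naive linear-time sketch, shown below. It illustrates only their semantics, not the $o(n)$-bit index; it assumes the BP sequence is a Python string of `(` and `)`, that "up to that position" includes position $i$ itself, and that the search starts strictly after $i$. The function names are hypothetical.

```python
def excess(bp, i):
    """Number of unclosed open parentheses in bp[0..i], position i included."""
    return sum(1 if c == "(" else -1 for c in bp[: i + 1])


def next_excess(bp, i, k):
    """Smallest position j > i with excess(bp, j) == k, or None if there is none."""
    e = excess(bp, i)
    for j in range(i + 1, len(bp)):
        e += 1 if bp[j] == "(" else -1
        if e == k:
            return j
    return None
```

For the sequence `"(()(()))"` the excess values at positions 0 to 7 are 1, 2, 1, 2, 3, 2, 1, 0, so a search from position 0 for excess 3 stops at position 4.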
Related work

The problem of representing a function $f$ space-efficiently in the "black box"
model, so that $f^{-1}$ can be computed quickly, was considered by Hellman [20].
Specialized to perms, Hellman's idea is similar to our "black box" representation
for representing a perm and its inverse, modulo some implementation details.
The version of the function powers problem that we consider is different: whereas
Hellman attempts, given $x$, to find any $y$ such that $f(y) = x$, we enumerate all
such $y$. Furthermore, our solution does not use the "black box" model, and
assumes space for representing $f$ in its entirety, which is both unnecessary and
prohibitive in Hellman's context.

Representing trees to support level-ancestor queries is a well-studied problem.
Solutions with $O(n)$ preprocessing time and $O(1)$ query time were given
by Dietz 0], Berkman and Vishkin Q and by Alstrup and Holm [1]. A much
simpler solution was given by Bender and Farach-Colton [3]. For a tree on $n$
nodes, all these solutions require $O(n)$ words, or $O(n \lg n)$ bits, to represent the
tree itself, and the additional data structures stored to support level-ancestor
queries also take $O(n)$ words (level-successor/predecessor is trivial using $\Theta(n)$
words).

As noted above, our interest is in succinct tree representations. We make a 
few remarks about such representations, so as to better understand our contri- 
bution in relation to others. Succinct tree representations can also be considered 
to be split into a tree encoding that takes $2n + o(n)$ bits, and an index of $o(n)$
bits for that tree encoding. There are many tree encodings, including BP [25],
DFUDS [4], LOUDS [21] and Partition [12], and it is not known if they are
equivalent, i.e. if there are operations that have o(n) sized indices for one tree 
encoding and not the other. Another feature is that different tree encodings 
impose different numberings on the nodes of the tree. Therefore, a result show- 
ing a succinct index for a particular operation in (say) BP does not imply the 
existence of a succinct index for that operation in (say) LOUDS. This matters 
from an application perspective because the only way to get a space-efficient 
data structure that simultaneously supports operations a and b, where a and b 
are known to be supported only by (say) LOUDS and BP-based tree encodings 
respectively, would be to encode the tree twice, once each in LOUDS and BP 
and to maintain the correspondence between the LOUDS and BP numberings, 
which would severely affect the space usage. 

We provide $o(n)$-bit BP indices for the operations of level-ancestor and
level-successor/predecessor, via excess search. Geary et al. [12] gave a $o(n)$-bit index
for supporting level-ancestor in $O(1)$ time using the Partition encoding, but
they did not provide support for level-successor/predecessor; a $o(n)$-bit index
for supporting these queries was announced by He et al. [19]. Very recently
Sadakane and Navarro [33] gave an alternative algorithm for excess search in
BP and showed that excess search together with range-minimum queries suffice 
to support a wide variety of tree operations, among other things. Their excess 
index is of smaller size, but seems not to support search for excess values greater 
than the starting point. 
1.2. Motivation

There are a number of motivations for succinct data structures in general,
many to do with text indexing or representing huge graphs [17, 21]. Work
on succinct representation of a perm and its inverse was, for one of the authors,
originally motivated by a data warehousing application. Under the indexing 
scheme in the system, the perm corresponding to the rows of a relation sorted 
under any given key was explicitly stored. It was realized that to perform certain 
joins, the inverse of a segment of this perm was precisely what was required. 
The perms in question occupied a substantial portion of the several hundred 
gigabytes in the indexing structure and doubling this space requirement (for the 
perm inverses) for the sole purpose of improving the time to compute certain 
joins was inappropriate. 

Since the publication of the preliminary versions of these papers, the results
herein have found numerous applications, most notably to the problem
of supporting rank and select operations over strings of large alphabets [16].
Other applications arise in Bioinformatics Q. The more general problem of
quickly computing $\pi^k()$ also has a number of applications. An interesting one is
determining the $r$-th root of a perm [30]. Our techniques not only solve the $r$-th
power problem immediately, but can also be used to find the $r$-th root, if one
exists. Inverting a "one-way" function, particularly in the scenario considered
by Hellman [20], is a fundamental task in cryptography.

Finally, very recently a number of results have been shown that focus on the 
redundancy of succinct data structures for various objects, including 1^, 13, 14. 
[2^ | ; we have already mentioned lower bounds on the redundancy of representing 
perms in particular. This has been accompanied by some remarkable results on 
very low-redundancy data structures. For example, consider the simple task of 
representing a sequence of $n$ integers from $[r]$, for some $r > 1$, to permit random
access to the $i$-th integer. The naive bound of $n \lceil \lg r \rceil$ bits has redundancy
$O(n)$ bits relative to the optimal $\lceil n \lg r \rceil$ bits. Following the first non-trivial
result on this topic ([26, Theorem 3]), a line of work culminated in Dodis et al.'s
remarkable result that $O(1)$-time access can be obtained with effectively zero
redundancy Q . We also note that the redundancy is often important in practice,
as the "lower-order" redundancy term in the space usage is often significant for
practical input sizes [11].

The remainder of the paper is organized as follows. The next section describes
some previous results on indexable dictionaries used in later sections.
Section 3 deals with permutation representations. In Section 3.1 we describe
the 'shortcut' method, and Section 3.2 describes an optimal space representation
based on Benes networks. Both of these are representations supporting $\pi()$ and
$\pi^{-1}()$ queries, and we consider the optimality of these solutions in Section 3.3.
In Section 3.4 we consider representations that support arbitrary powers.
Sections 4 and 5 deal with general function representation. Section 4 outlines new
operations on balanced parenthesis sequences which lead to an optimal-space
tree representation that supports level-ancestor queries along with various other
navigational operations in constant time. Section 5 describes a succinct
representation of a function that supports computing arbitrary powers in optimal
time.

2. Preliminaries 

Given a set $S \subseteq [m]$, $|S| = n$, define the following operations:

rank($x$, $S$): Given $x \in [m]$, return $|\{y \in S \mid y < x\}|$,

select($i$, $S$): Given $i \in [n]$, return the $(i+1)$-st smallest element in $S$,

p-rank($x$, $S$): Given $x \in [m]$, return $-1$ if $x \notin S$ and rank($x$, $S$) otherwise (the
partial rank operation).

Furthermore, define the following data structures: 

• A fully indexable dictionary (FID) representation for $S$ supports rank($x$, $S$),
select($i$, $S$), rank($x$, $\bar{S}$) and select($i$, $\bar{S}$) in $O(1)$ time, where $\bar{S} = [m] \setminus S$.

• An indexable dictionary (ID) representation for $S$ supports p-rank($x$, $S$) and
select($i$, $S$) in $O(1)$ time.
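
The semantics of the three operations can be sketched with a sorted list and binary search. This is not a succinct dictionary and its queries take $O(\lg n)$ rather than $O(1)$ time; the class name `SortedSetIndex` is hypothetical.

```python
from bisect import bisect_left


class SortedSetIndex:
    """Reference semantics for rank, select and p-rank on a set S of integers."""

    def __init__(self, elements):
        self.items = sorted(set(elements))

    def rank(self, x):
        """Return |{y in S : y < x}|."""
        return bisect_left(self.items, x)

    def select(self, i):
        """Return the (i+1)-st smallest element of S, for 0 <= i < |S|."""
        return self.items[i]

    def p_rank(self, x):
        """Return rank(x) if x is in S, and -1 otherwise."""
        r = bisect_left(self.items, x)
        return r if r < len(self.items) and self.items[r] == x else -1
```

With `S = {2, 5, 9}`, `rank(6)` counts the two smaller elements, while `p_rank(6)` reports that 6 is absent.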

Raman, Raman and Rao [32] show the following:

Theorem 2.1. On the RAM model with wordsize $O(\lg m)$ bits:

(a) There is a FID for a set $S \subseteq [m]$ of size $n$ using at most
$\lceil \lg \binom{m}{n} \rceil + O(m \lg\lg m/\lg m)$ bits.

(b) There is an ID for a set $S \subseteq [m]$ of size $n$ using at most
$\lceil \lg \binom{m}{n} \rceil + o(n) + O(\lg\lg m)$ bits.

3. Representing Permutations 

3.1. The Shortcut Method 

We first provide a space-efficient representation (based on Hellman's idea)
that supports $\pi^{-1}()$ in the "black box" model. Recall that in the "black box"
model, the perm is accessible only through calls of $\pi()$. Let $t > 2$ be a parameter.
We trace the cycle structure of the perm $\pi$, and for every cycle whose length $k$ is
greater than $t$, the key idea is to associate with some selected elements, a shortcut
pointer to an element $t$ positions prior to it. Specifically, let $c_0, c_1, \ldots, c_{k-1}$
be the elements of a cycle of the perm $\pi$ such that $\pi(c_i) = c_{(i+1) \bmod k}$, for
$i = 0, 1, \ldots, k-1$. We associate shortcut pointers with the indices whose $\pi$
values are $c_{it}$, for $i = 0, 1, \ldots, l = \lfloor k/t \rfloor$, and the shortcut pointer value at $c_{it}$
stores the index whose $\pi$ value is $c_{((i-1) \bmod (l+1))t}$, for $i = 0, 1, \ldots, l$ (see Fig. 1).
Let $s < n/t$ be the number of shortcut pointers after doing this for every cycle
of the perm and let $d_1 < d_2 < \ldots < d_s$ be the elements associated with shortcut
pointers.
Figure 1: Shortcut method. Solid lines denote the perm, and the dotted lines denote the
shortcut pointers. The shaded nodes indicate the positions having shortcut pointers. 

We store the set $\{d_i\}$ in a data structure $D$ that is an instance of the
indexable dictionary (ID) of Theorem 2.1(b). Given an index $i$, $D$ allows us to
test if a particular element has a shortcut pointer with it, and if so, returns its
position in the set $\{d_i\}$. We store the sequence $\{s_i\}$, where $s_i$ is the shortcut
pointer associated with $d_i$, in an array $S$. The following procedure computes
$\pi^{-1}(x)$ for a given $x$:

i := x;
while $\pi(i) \ne x$ do
    if $i \in D$ and p-rank(i, D) = r   // both found by querying D
    then j := S[r];
    else j := $\pi(i)$;
    i := j;
endwhile
return i

Since we have a shortcut pointer for every $t$ elements of a cycle, the number of
$\pi()$ evaluations made by the algorithm is at most $t + 1$, and all other operations
take $O(1)$ time by Theorem 2.1. By the standard approximation
$\lceil \lg \binom{n}{s} \rceil = s(\lg(n/s) + O(1))$, we see that the space used by $D$ is at most
$(n/t)(\lg t + O(1))$ bits. The space used by $S$ is clearly $s\lceil \lg n \rceil = s(\lg n + O(1))$.
Thus we have:

Theorem 3.1. Given an arbitrary permutation $\pi$ on $[n]$ as a "black box", and
an integer $1 < t < n$, there is a data structure that uses at most
$(n/t)(\lg n + \lg t + O(1))$ bits that allows $\pi^{-1}()$ to be computed in at most $t + 1$
evaluations of $\pi()$, plus $O(t)$ time.
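
The shortcut idea can be sketched in a few lines. This is a minimal illustration under stated assumptions rather than the data structure of Theorem 3.1: a Python dict stands in for the indexable dictionary $D$ together with the array $S$, marks are placed at cycle positions $0, t, 2t, \ldots$, and a query follows a shortcut at most once (a choice made in this sketch so that the forward walk after the jump always terminates). The names `build_shortcuts` and `inverse` are hypothetical.

```python
def build_shortcuts(perm, t):
    """Map each marked element to the marked element just before it on its cycle.

    On every cycle longer than t, positions 0, t, 2t, ... are marked, so
    consecutive marks (including the wrap-around) are at most t apart.
    """
    n = len(perm)
    seen = [False] * n
    back = {}
    for start in range(n):
        if seen[start]:
            continue
        cycle = []
        x = start
        while not seen[x]:
            seen[x] = True
            cycle.append(x)
            x = perm[x]
        if len(cycle) > t:
            marks = list(range(0, len(cycle), t))
            for a, pos in enumerate(marks):
                # marks[a - 1] wraps to the last mark when a == 0.
                back[cycle[pos]] = cycle[marks[a - 1]]
    return back


def inverse(perm, back, x):
    """Return the y with perm[y] == x, using perm only through forward evaluations."""
    i = x
    jumped = False
    while True:
        nxt = perm[i]          # one evaluation of the black box per iteration
        if nxt == x:
            return i
        if not jumped and i in back:
            i = back[i]        # jump back past x, then walk forward to its predecessor
            jumped = True
        else:
            i = nxt
```

A query walks forward from $x$ to the first marked element, jumps back once, and then walks forward until it reaches the predecessor of $x$; under the marking used here this costs at most $t + 1$ evaluations of `perm`.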

We get the following easy corollary: 

Corollary 3.1. There is a representation of an arbitrary perm $\pi$ on $[n]$ using
at most $\mathcal{P}(n) + O((n/t)\lg n)$ bits, for any $1 < t < \lg n$, that supports $\pi()$ in $O(1)$
time and $\pi^{-1}()$ in $O(t)$ time.

Proof. We represent $\pi$ naively as an array taking $n\lceil \lg n \rceil = \mathcal{P}(n) + O(n)$ bits,
and allowing $\pi()$ to be computed in $O(1)$ time, and apply Theorem 3.1. The
space bound follows since for $t < \lg n$, $(n/t)(\lg n + \lg t + O(1)) = O((n/t)\lg n)$,
and the $O(n)$ term is likewise $O((n/t)\lg n)$. □
Remark: Choosing $t = \lceil 1/\epsilon \rceil$ for any constant $\epsilon > 0$ in Corollary 3.1, we get a
representation of a permutation $\pi$ on $[n]$ in $(1 + \epsilon) n \lg n$ bits where $\pi()$ and $\pi^{-1}()$
both take $O(1)$ time.

3.2. Representations based on the Benes network 
3.2.1. The Benes Network 

The results in this section are based on the Benes network, a communication
network composed of a number of switches, which we now briefly outline (see
[23] for details). Each switch has two inputs $x_0$ and $x_1$ and two outputs $y_0$ and
$y_1$ and can be configured either so that $x_0$ is connected to $y_0$ (i.e. a packet that
is input along $x_0$ comes out of $y_0$) and $x_1$ is connected to $y_1$, or the other way
around. An $r$-Benes network has $2^r$ inputs and $2^r$ outputs, and is defined as
follows. For $r = 1$, the Benes network is a single switch with two inputs and
two outputs. An $(r+1)$-Benes network is composed of $2^{r+1}$ switches and two
$r$-Benes networks, connected as shown in Fig. 2(a). A particular setting of the
switches of a Benes network realises a perm $\pi$ if a packet introduced at input
$i$ comes out at output $\pi(i)$, for all $i$ (Fig. 2(b)). The following properties are
either easy to verify or well-known [23].

• An $r$-Benes network has $r2^r - 2^{r-1}$ switches, and every path from an input
to an output passes through $2r - 1$ switches;

• For every perm $\pi$ on $[2^r]$ there is a setting of the switches of an $r$-Benes
network that realises $\pi$.




Figure 2: The Benes network construction and an example 

Clearly, Benes networks may be used to represent perms. If $n = 2^r$, a
representation of a perm $\pi$ on $[n]$ may be obtained by configuring an $r$-Benes network
to realize $\pi$ and then listing the settings of the switches in some canonical order
(e.g. level-order). This represents $\pi$ using $r2^r - 2^{r-1} = n\lg n - n/2$ bits. Given
$i$, one can trace the path taken by a packet at input $i$ by inspecting the
appropriate bits in this representation, and thereby compute $\pi(i)$; by tracing the path
back from output $i$ we can likewise compute $\pi^{-1}(i)$. The time taken is clearly
$O(\lg n)$; indeed, the algorithm only makes $O(\lg n)$ bit-probes. To summarize:

Proposition 3.1. When $n = 2^r$ for some integer $r > 0$, there is a representation
of an arbitrary perm $\pi$ on $[n]$ that uses $n\lg n - n/2$ bits and supports the
operations $\pi()$ and $\pi^{-1}()$ in $O(\lg n)$ time.
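
The space bound of Proposition 3.1 is one bit per switch, so it can be checked by counting switches with the recursive definition of the network. The sketch below does only that counting; it does not compute switch settings. The function name `benes_switches` is hypothetical.

```python
def benes_switches(r):
    """Number of switches in an r-Benes network, following the recursive definition."""
    if r == 1:
        return 1  # a single switch with two inputs and two outputs
    # An r-Benes network adds 2^r outer switches around two (r-1)-Benes networks.
    return 2 ** r + 2 * benes_switches(r - 1)
```

With $n = 2^r$, the value returned agrees with the closed form $r2^r - 2^{r-1} = n \lg n - n/2$; for instance, $r = 3$ gives $8 + 2 \cdot 6 = 20 = 24 - 4$.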

However, the Benes network has two shortcomings from our viewpoint:
firstly, the Benes network is defined only for values of $n$ that are powers of
2. In order to represent a perm with $n$ not a power of 2, rounding up $n$ to
the next higher power of 2 could double the space usage, which is unacceptable.
Furthermore, even for $n$ a power of 2, representing a perm using a Benes
network uses $\mathcal{P}(n) + \Omega(n)$ bits.

We now define a family of Benes-like networks that admit greater flexibility
in the number of inputs, namely the $(q, r)$-Benes networks, for integers
$r \ge 0$, $q \ge 1$.

Definition 3.1. Define a $q$-permuter to be a communication network that has $q$ inputs
and $q$ outputs, and realises any of the $q!$ perms of its inputs (an $r$-Benes network
is a $2^r$-permuter).

Definition 3.2. A $(q, r)$-Benes network is a $q$-permuter for $r = 0$, and for
$r > 0$ it is composed of $q2^r$ switches and two $(q, r-1)$-Benes networks, connected
together in exactly the same way as a standard Benes network.

Lemma 3.1. Let $q \ge 1$, $r \ge 0$ be integers and take $p = q2^r$. Then:

1. A $(q, r)$-Benes network consists of $q2^{r-1}(2r - 1)$ switches and $2^r$ $q$-permuters;

2. For every perm $\pi$ on $[p]$ there is a setting of the switches of a $(q, r)$-Benes
network that realises $\pi$.

Proof. (1) is obvious; (2) can be proved in the same way as for a standard Benes
network. □

We now consider representations based on $(q, r)$-Benes networks; a crucial
component is the representation of the central $q$-permuters, which we address
in the next subsection. Since we are not interested in designing communication
networks as such, we focus instead on ways to represent the perms represented by
the central $q$-permuters in optimal (or very close to optimal) space and operate
on them (specifically, to compute $\pi()$ and $\pi^{-1}()$ on the perms represented by the
$q$-permuters) in the bit-probe, cell-probe or RAM model. This is sufficient to
compute $\pi()$ and $\pi^{-1}()$ in the $(q, r)$-Benes network at large.

3.2.2. Representing Small Perms 

In this section we consider the highly space-efficient representation of "small"
perms to use as a central $q$-permuter in a $(q, r)$-Benes network. It is
straightforward (as noted in Section 3.3) to represent a perm on $[q]$, $q = O(\lg n/\lg\lg n)$,
and operate on it in the cell-probe model, or by table lookup in the RAM model.
As we will see, the larger we can make our central $q$-permuters (while keeping
optimal space and reasonable processing times), the lower the redundancy of
our representation. With this in mind, we now give a method for asymptotically
larger values of $q$. We use the following complexity bounds for integer
multiplication and division using the fast Fourier Transform Q:

Lemma 3.2. Given a number $A$ occupying $m$ words and another number $B < A$,
one can compute the numbers ($A$ mod $B$) and ($A$ div $B$) in $O(m \lg m)$ time.

Lemma 3.3. If $q < (\lg n)^2/(\lg\lg n)^4$, then there is a representation of an
arbitrary perm $\pi$ on $[q]$ using $\mathcal{P}(q)$ bits that supports $\pi(i)$ and $\pi^{-1}(i)$ in
$O(\lg n/\lg\lg n)$ time. This assumes access to a set of precomputed constants that
depend on $q$ and can be stored in $O(q^2 \lg q)$ bits and also precomputed tables of
size $\sqrt{n}(\lg n)^{O(1)}$ bits.

Proof. We represent a perm $\pi$ over $[q]$ as a sequence $r(0), r(1), \ldots, r(q-1)$,
where $r(0) = 0$ and for $1 \le i < q$, $r(i) = |\{j < i \mid \pi(j) < \pi(i)\}|$ is the rank of
$\pi(i)$ in the set $\{\pi(0), \pi(1), \ldots, \pi(i-1)\}$. This sequence is viewed as a $q$-digit number
in a "mixed-radix" system, where the $i$-th digit $r(i)$ is from $[i+1]$, representing
the integer $R = \sum_{i=0}^{q-1} r(i) \cdot i!$. The perm $\pi$ is encoded by storing $R$ in binary:
since $R$ is an integer from $[q!]$, the space used by the encoding is $\mathcal{P}(q)$ bits, and
$R$ is stored in $m = O(\lg n/(\lg\lg n)^3)$ words. To compute $\pi()$ or $\pi^{-1}()$, we first
decode the sequence $r(0), \ldots, r(q-1)$ from $R$ in $O(m(\lg m)^2)$ time, and from
this sequence compute $\pi()$ and $\pi^{-1}()$ in $O(m \lg m)$ and $O(m)$ time respectively,
for an overall running time of $O(m(\lg m)^2) = O(\lg n/\lg\lg n)$. We now describe
these steps, assuming for simplicity that $q$ is a power of 2.

To decode $R$, we first obtain representations $R'$ and $R''$ of the sequences of
digits $r(q-1), r(q-2), \ldots, r(q/2)$, and $r(q/2-1), \ldots, r(0)$ as $R' = (R \text{ div } (q/2)!)$
and $R'' = (R \bmod (q/2)!)$ in $O(m \lg m)$ time, and recurse. When recursing,
note that $\lg R' - (\lg R)/2 = O(q)$ bits, so the lengths of $R'$ and $R''$ are equal to
within $O(m/\lg m)$ words. Standard arithmetic, plus table lookup, is used once
the integer to be decoded fits into a single word. Thus, the recurrence is:

$$T(m) = m \lg m + T(m_1) + T(m_2), \qquad T(1) = O(1),$$

where $m_1 + m_2 \le m + 1$ and $|m_j - m/2| = O(m/\lg m)$ (for $j = 1, 2$), which
clearly solves to $O(m(\lg m)^2)$. (It is assumed that the divisors at each level of
the recursion, such as $(q/2)!$ at the top level, $(q/4)!$ and $(3q/4)(3q/4 - 1) \cdots (q/2)$
at the next level etc., are pre-computed, but these depend on $q$ only, and are
independent of the perm $\pi$.)

We partition the sequence $r(q-1), \ldots, r(0)$ into chunks of $c = \lceil \frac{1}{2}(\lg n/\lg q) \rceil$
consecutive numbers each; each chunk fits into a single word and the number of
chunks is $O(m)$. Define under($x$, $i$) as the number of values in $\pi(q-1), \ldots, \pi(i)$
that are $< x$. As $r(q-1) = \pi(q-1)$, under($x$, $q-1$) is immediate. Further
observe that:

• if $r(i) = x - \mathrm{under}(x, i+1) - 1$ then $\pi(i) = x$;

• if $r(i) < x - \mathrm{under}(x, i+1) - 1$ then $\pi(i) < x$;

• if $r(i) > x - \mathrm{under}(x, i+1) - 1$ then $\pi(i) > x$.

Thus, under($x$, $i$) is easily computed from under($x$, $i+1$) and $r(i)$. Given under($x$, $i$)
and a chunk $r(i-1), \ldots, r(i-c)$ one can perform all the following tasks in $O(1)$
time using table lookup:

• compute under($x$, $i-c$);

• determine if there is a $j \in [i-1, i-c]$ such that $\pi(j) = x$;

• given a position $j \in [i-1, i-c]$, determine whether $\pi(j) < x$ or $> x$.

This gives an $O(m)$-time algorithm for computing $\pi^{-1}()$ and an $O(m \lg m)$-time
algorithm for computing $\pi()$ (via binary search). □
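
The mixed-radix encoding used in this proof can be illustrated with Python's arbitrary-precision integers. The sketch below is a naive quadratic-time version: it uses neither the FFT-based divide-and-conquer decoding nor the chunked table lookup described above, and serves only to show that the digits $r(i)$ determine $\pi$ and that $R$ ranges over $[q!]$. The names `encode` and `decode` are hypothetical.

```python
def encode(perm):
    """Return R = sum of r(i) * i!, where r(i) = |{j < i : perm[j] < perm[i]}|."""
    R, fact = 0, 1  # fact equals i! at the start of iteration i
    for i, v in enumerate(perm):
        r_i = sum(1 for j in range(i) if perm[j] < v)
        R += r_i * fact
        fact *= i + 1
    return R


def decode(R, q):
    """Recover the perm on [q] whose encoding is R."""
    digits = []
    for i in range(q):
        R, d = divmod(R, i + 1)  # digit i has radix i + 1
        digits.append(d)
    remaining = list(range(q))   # values not yet assigned, in increasing order
    perm = [0] * q
    for i in range(q - 1, -1, -1):
        # perm[i] has rank digits[i] among the values used by positions 0..i.
        perm[i] = remaining.pop(digits[i])
    return perm
```

For example, `encode([2, 0, 1])` has digits $r = (0, 0, 1)$ and so equals $1 \cdot 2! = 2$, and `decode(2, 3)` returns `[2, 0, 1]`.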

3.2.3. Representing Larger Perms

We will now use the representation of Lemma 3.3 to represent larger
permutations via the Benes network. We begin by showing:

Proposition 3.2. For all integers $p, t > 0$, $p \ge t$, there is an integer $p' \ge p$
such that $p' = q2^\ell$ and $p' < p(1 + 1/t)$, for integers $q$ and $\ell$ where $t \le q \le 2t$
and $\ell \ge 0$.

Proof. Take $q$ to be $\lceil p/2^\ell \rceil$, where $\ell$ is the integer that satisfies $t \le p/2^\ell < 2t$.
Note that $p' < (p/2^\ell + 1) \cdot 2^\ell = p(1 + 2^\ell/p) \le p(1 + 1/t)$. □
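
The construction in this proof is easy to carry out explicitly. The following sketch assumes positive integers with $p \ge t$ and follows the proof step by step; the function name `round_up` is hypothetical.

```python
def round_up(p, t):
    """Return (p_prime, q, ell) with p_prime = q * 2**ell >= p and t <= q <= 2*t."""
    ell = 0
    # Choose the largest ell with t * 2^ell <= p, so that t <= p / 2^ell < 2t.
    while t << (ell + 1) <= p:
        ell += 1
    q = -(-p // (1 << ell))  # ceil(p / 2^ell) using integer arithmetic
    return q << ell, q, ell
```

For instance, `round_up(100, 8)` picks $\ell = 3$ (since $8 \cdot 2^3 \le 100 < 8 \cdot 2^4$), so $q = \lceil 100/8 \rceil = 13$ and $p' = 104 < 100(1 + 1/8)$.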

Now we describe the necessary modifications to the Benes network. Although 
no new ideas are needed, a little care is needed to minimize redundancy. 

Lemma 3.4. For any integer $p < n$, if $p = q2^r$ for integers $q$ and $r$ such that
$(\lg n)^2/(2(\lg\lg n)^4) < q < (\lg n)^2/(\lg\lg n)^4$ and $r \ge 0$, then there is a
representation of an arbitrary perm $\pi$ on $[p]$ that uses $\mathcal{P}(p) + \Theta((p \lg q)/q)$ bits, and
supports $\pi()$ and $\pi^{-1}()$ in $O(r + \lg n/\lg\lg n)$ time each. This assumes access to
a pre-computed table of size $O(\sqrt{n}(\lg n)^c)$ bits that does not depend upon $\pi$, for
some constant $c > 0$.

Proof. Consider the $(q, r)$-Benes network that realizes the perm $\pi$, and represent
this network as follows. List all the switch settings of the outer $2r$ layers of
switches as in Proposition 3.1 and represent each of the central $q$-permuters
using Lemma 3.3. The representation of Lemma 3.3 requires pre-computed
tables of size $O(\sqrt{n}(\lg n)^c)$ bits (for some constant $c > 0$), which can be shared
over all the applications of the lemma. We now calculate the space used. Note
that:

$$
\begin{aligned}
\mathcal{P}(p) &= p\lg(p/e) + \Theta(\lg p) = q2^r(r + \lg q - \lg e) + \Theta(\lg p) \\
&= qr2^r + 2^r(q\lg(q/e)) + \Theta(\lg p).
\end{aligned}
$$

By Lemma 3.1 and Lemma 3.3 the space used by the above representation
(excluding lookup tables) is $qr2^r + 2^r\mathcal{P}(q) = qr2^r + 2^r(q\lg(q/e) + \Theta(\lg q)) =
\mathcal{P}(p) + \Theta((p\lg q)/q)$.

The running time for the queries follows from the fact that we need to look at
$O(r)$ bits among the outer layers of switch settings, and that the representation
of the central $q$-permuter (Lemma 3.3) supports the queries in $O(\lg n/\lg\lg n)$
time. □

Theorem 3.2. An arbitrary perm $\pi$ on $[n]$ may be represented using $\mathcal{P}(n) +
O(n(\lg\lg n)^5/(\lg n)^2)$ bits, such that $\pi()$ and $\pi^{-1}()$ can both be computed in
$O(\lg n/\lg\lg n)$ time.

Proof. Let $t = (\lg n)^3$. We first consider representing a perm $\psi$ on $[l]$ for some
integer $l$, $t \le l < 2t$. To do this, we find an integer $p = l(1 + O((\lg\lg n)^4/(\lg n)^2))$
that satisfies the preconditions of Lemma 3.4; such a $p$ exists by Proposition 3.2.
An elementary calculation shows that $\mathcal{P}(p) = \mathcal{P}(l)(1 + O((\lg\lg n)^4/(\lg n)^2)) =
\mathcal{P}(l) + O(\lg n(\lg\lg n)^5)$. We extend $\psi$ to a perm on $[p]$ by setting $\psi(i) = i$ for
all $l \le i < p$ and represent $\psi$. By Lemma 3.4, $\psi$ can be represented using
$\mathcal{P}(p) + \Theta((p\lg p)(\lg\lg n)^4/(\lg n)^2) = \mathcal{P}(l) + \Theta(\lg n(\lg\lg n)^5)$ bits such that the $\psi()$
and $\psi^{-1}()$ operations are supported in $O(\lg n/\lg\lg n)$ time, assuming access to
a pre-computed table of size $O(\sqrt{n}(\lg n)^c)$ bits, for some constant $c > 0$.

Now we represent $\pi$ as follows. We choose an $n' \ge n$ such that $n' = n(1 +
O(1/(\lg n)^3))$ and $n' = q2^r$ for some integers $q, r$ such that $t \le q < 2t$. Again
we extend $\pi$ to a perm on $[n']$ by setting $\pi(i) = i$ for $n \le i < n'$, and represent
this extended perm. As in Lemma 3.4, we start with a $(q, r)$-Benes network
that realises $\pi$ and write down the switch settings of the $2r$ outer levels in
level-order. The perms realised by the central $q$-permuters are represented using
Lemma 3.4. Ignoring any pre-computed tables, the space requirement is $qr2^r +
2^r(\mathcal{P}(q) + \Theta(\lg n(\lg\lg n)^5))$ bits, which is again easily shown to be $\mathcal{P}(n') +
\Theta((n'\lg n')/q + 2^r\lg n(\lg\lg n)^5) = \mathcal{P}(n') + \Theta(n(\lg\lg n)^5/(\lg n)^2)$ bits. Finally,
as above, $\mathcal{P}(n') = (1 + O(1/(\lg n)^3))\mathcal{P}(n)$, and the space requirement is $\mathcal{P}(n) +
\Theta(n(\lg\lg n)^5/(\lg n)^2)$ bits.

The running time for $\pi()$ and $\pi^{-1}()$ is clearly $O(\lg n)$. To improve this
to $O(\lg n/\lg\lg n)$, we now explain how to step through multiple levels of a
Benes network in $O(1)$ time, taking care not to increase the space consumption
significantly. Consider a $(q, r)$-Benes network and let $t = \lfloor \lg\lg n - \lg\lg\lg n \rfloor - 1$.
Consider the case when $t < r$ (the other case is easier), and consider input
number 0 to the $(q, r)$-Benes network. Depending upon the settings of the
switches, a packet entering at input 0 may reach any of $2^t$ switches in $t$ steps.
A little thought shows that the only packets that could appear at the inputs to
these $2^t$ switches are the $2^{t+1}$ packets that enter at inputs $0, 1, k, k+1, 2k, 2k+1, \ldots$,
where $k = q2^{r-t}$. The settings of the $t2^t$ switches that could be seen by
any one of these packets suffice to determine the next $t$ steps of all of these
packets. Hence, when writing down the settings of the switches of the Benes
network in the representation of $\pi$, we write all the settings of these switches
in $t2^t \le (\lg n)/2$ consecutive locations. Using table lookup, we can then step
through $t$ of the outer $2r$ layers of the $(q, r)$-Benes network in $O(1)$ time. Since
computing the effect of the central $q$-permuter takes $O(\lg n/\lg\lg n)$ time, we see
that the overall running time is $O(r/t + \lg n/\lg\lg n) = O(\lg n/\lg\lg n)$. □

3.3. Optimality 

We now consider the optimality of the solutions given in the previous two 
sections: specifically, if they achieve the best possible redundancy for a given 
query time. As noted in the Introduction, Golynski [15, Theorem 17] has shown
that any data structure in the "black-box" model that supports $\pi^{-1}()$ in at most
$t \le n/2$ evaluations of $\pi()$ requires an index of size $\Omega((n/t)\lg(n/t))$. This shows
the asymptotic optimality of Theorem 3.1 for $t = n^{1-\Omega(1)}$. In the cell probe
model, Golynski [14] shows that:

Lemma 3.5. For any data structure which uses $\mathcal{P}(n) + r$ bits of space to
represent a perm over $[n]$ and supports $\pi()$ and $\pi^{-1}()$ in time $t_f$ and $t_i$ respectively,
such that $\max\{t_f, t_i\} \le (1/16)(\lg n/\lg\lg n)$, it holds that $r = \Omega((n\lg n)/(t_f t_i))$
bits.

This shows that Corollary 3.1 is optimal for a range of values of the parameter
$t$. Specifically, there is a constant $c$ (which depends upon the constant within the
$O()$ in Corollary 3.1 and the value 1/16 in Lemma 3.5) such that the redundancy
of Corollary 3.1 is asymptotically optimal for all $t \le c\lg n/\lg\lg n$. In order to
clarify the relationship of Lemma 3.5 to the results in Section 3.2, we have the
following proposition:

Proposition 3.3. In the cell probe model with word size $O(\log n)$, a perm $\pi$
on $[n]$ can be represented as follows:

i. Both $\pi()$ and $\pi^{-1}()$ can be computed using $2\lg n/\lg\lg n + O(1)$ probes,
and the space used is $\mathcal{P}(n) + O(n(\lg\lg n)^2/\lg n)$ bits.

ii. Both $\pi()$ and $\pi^{-1}()$ can be computed using $(2 + \epsilon)\lg n/\lg\lg n + O(1)$ probes,
for any constant $\epsilon > 0$, and the space used is $\mathcal{P}(n) + O(n(\lg\lg n)^3/(\lg n)^2)$
bits.

Proof. In the cell probe model, we note that given a perm $\pi$ on $[q]$, one can
compute $\pi()$ and $\pi^{-1}()$ in $O(1 + (q\lg q)/\lg n)$ time, using $\mathcal{P}(q)$
bits. This is done by representing $\pi$ implicitly, e.g., as the index of $\pi$ in a
canonical enumeration of all perms on $[q]$, and computing $\pi()$ and $\pi^{-1}()$ by simply
reading the entire representation (which occupies $O(1 + (q\lg q)/\lg n)$ cells). Two
particular values of $q$ are of interest here: $q_1 = O(\lg n/\lg\lg n)$, when the time is
$O(1)$ probes, and $q_2 = \epsilon(\lg n/\lg\lg n)^2$, for some constant $\epsilon < 1$, when the time
is at most $\epsilon\lg n/\lg\lg n$ probes.

Using these representations as the central $q$-permuter in Lemma 3.4 followed
by Theorem 3.2, we note that the number of probes made in the outer layers
of the Benes network is at most $2\lg n/\lg\lg n$. By adding the probes made to
the central $q$-permuter (for both $q = q_1$ and $q = q_2$), we get the numbers of
probes claimed. The redundancies are obtained by straightforward calculation
as in Lemma 3.4 and Theorem 3.2. □

The first of the two cases represents the lowest number of probes that we are
able to achieve with our approach. Although the number of probes is still higher
than the maximum number of probes allowed by Lemma 3.5, the redundancy
equals the lowest redundancy provable by Lemma 3.5. However, with a very
small increase in the number of probes, the redundancy drops considerably (and
in fact is lower than that of Theorem 3.2).

3.4. Supporting Arbitrary Powers

We now consider the problem of representing an arbitrary perm $\pi$ to compute
$\pi^k()$ for $k > 1$ (or $k < -1$) more efficiently than by repeated application of $\pi()$
(or $\pi^{-1}()$). Here we develop a succinct structure to support all powers of $\pi$
(including $\pi()$ and $\pi^{-1}()$). The results in this section assume that we have $\mathcal{P}(n)$
bits (plus some redundancy) to store the representation, i.e., we do not work in
the "black-box" model.

Theorem 3.3. Suppose there is a representation $R$ taking $s(n)$ bits to store
an arbitrary perm $\pi$ on $[n]$, that supports $\pi()$ in time $t_f$, and $\pi^{-1}()$ in time
$t_i$. Then there is a representation for an arbitrary perm on $[n]$ taking $s(n) +
O(n\lg n/\lg\lg n)$ bits in which $\pi^k()$ for any integer $|k| \le n$ can be supported in
$t_f + t_i + O(1)$ time, and one taking $s(n) + O(\sqrt{n}\lg n)$ bits in which $\pi^k()$ can be
supported in $t_f + t_i + O(\lg\lg n)$ time.

Proof. Consider the cycle representation of the given perm $\pi$, in which for all
cycles of $\pi$, we write down the elements comprising the cycle, in the order in
which they appear in the cycle, starting with the smallest element in the cycle. It
will be convenient to consider the logical array $\psi$ of length $n$, which comprises the
cycles written in nondecreasing order of length, with logical separators marking
the boundary of each cycle (see Fig. 3 for an example).² Clearly, ignoring the
logical separators between cycles, $\psi$ is itself a permutation.

To compute $\pi^k(x)$ for any (positive or negative) $k$ we do the following:

1. find the position $j$ in $\psi$ that contains $x$,

2. find the left endpoint $l$ of the segment of $\psi$ that represents the cycle
containing $x$, and the length $\lambda$ of this cycle, and

3. return the element of $\psi$ in position $s = l + ((j - l + k) \bmod \lambda)$.
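The three steps above can be illustrated with a small Python sketch. It is not the succinct structure of the theorem: the inverse of $\psi$ and the per-position cycle bookkeeping are stored as plain arrays (an assumption made purely for readability), and the names `build_cycle_layout` and `power` are hypothetical.

```python
def build_cycle_layout(pi):
    """Lay the cycles of pi out contiguously, in nondecreasing order of length.

    Returns (psi, pos, start, length) where pos is the inverse of psi and
    start[j] / length[j] describe the cycle segment containing position j.
    """
    n = len(pi)
    seen = [False] * n
    cycles = []
    for x in range(n):               # x is the smallest element of a new cycle
        if not seen[x]:
            cyc, y = [], x
            while not seen[y]:
                seen[y] = True
                cyc.append(y)
                y = pi[y]
            cycles.append(cyc)
    cycles.sort(key=len)             # stable sort by cycle length
    psi, start, length = [], [], []
    for cyc in cycles:
        left = len(psi)
        psi.extend(cyc)
        start.extend([left] * len(cyc))
        length.extend([len(cyc)] * len(cyc))
    pos = [0] * n
    for j, x in enumerate(psi):
        pos[x] = j
    return psi, pos, start, length


def power(layout, k, x):
    """Return pi^k(x) for any integer k (negative k gives inverse powers)."""
    psi, pos, start, length = layout
    j = pos[x]                           # step 1: position of x in psi
    l, lam = start[j], length[j]         # step 2: left endpoint and cycle length
    return psi[l + (j - l + k) % lam]    # step 3: Python's % is non-negative here
```

Because Python's `%` always returns a non-negative result for a positive modulus, the same expression handles negative powers without a special case.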

Figure 3: A permutation $\pi$ and the logical array $\psi$ representing its cycles.

² One can dispense with the logical separators by writing the cycles in order of decreasing
minimum element, but this is not as convenient for our purposes.

The data structure for implementing this is as follows. We represent $\psi$ in
the assumed representation $R$. In Step (1), $j$ is computed as $\psi^{-1}(x)$ in time $t_i$,
and in Step (3), the return value is just $\psi(s)$, computed in time $t_f$. We now
focus on Step (2). Let $\lambda_1 < \lambda_2 < \ldots < \lambda_z$ be the distinct cycle lengths in $\pi$
(the example in Fig. 3 has $z = 3$); note that $z = O(\sqrt{n})$. We store the sequence
$\{\lambda_i\}$ in an array, using $O(\sqrt{n}\lg n)$ bits. Also consider the set $S = \{s_i\}$, where
$s_1 = 0$ and for $i = 2, \ldots, z$, $s_i$ is the total length of all cycles in $\pi$ whose length
is strictly less than $\lambda_i$ (note that $s_i$ is the starting position of the sequence of
cycles of size $\lambda_i$). Thus, if $j$ is the position of $x$ in $\psi$ in Step (1), then the length
$\lambda$ of the cycle containing $x$ is $\lambda_t$, where $t = \mathrm{rank}(j, S)$. Also, since all the cycles
of length $\lambda$ begin at $s_t = \mathrm{select}(S, t)$, it is straightforward to compute the left
endpoint of the cycle containing $x$. It only remains to describe how to represent
$S$. We choose two options, giving the claimed results:

• to represent $S$ in the FID of Theorem 2.1, taking $\lg\binom{n}{z} + O(n\lg\lg n/\lg n) =
O(n\lg\lg n/\lg n)$ bits, which supports rank and select in $O(1)$ time.

• to represent $S$ as an array, supporting select in $O(1)$ time, and also as a
predecessor data structure (e.g. the y-fast trie [34]) which supports rank
in $O(\lg\lg n)$ time. The space used by this option is $O(\sqrt{n}\lg n)$ bits. □
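Step (2) can be sketched as follows, with a sorted Python list and binary search standing in for the FID or the predecessor structure (an assumption of this sketch; neither of the two succinct options is implemented here). Positions are 0-based, and the helper names are hypothetical.

```python
from bisect import bisect_right


def build_length_index(cycle_lengths):
    """cycle_lengths: lengths of the cycles in layout order (nondecreasing).

    Returns (lam, S): the distinct lengths and, for each, the position in psi
    where the run of cycles of that length starts.
    """
    lam, S = [], []
    offset = 0
    for length in cycle_lengths:
        if not lam or lam[-1] != length:
            lam.append(length)
            S.append(offset)
        offset += length
    return lam, S


def locate_cycle(j, lam, S):
    """Return (left endpoint, length) of the cycle segment containing position j."""
    t = bisect_right(S, j) - 1                  # predecessor of j in S (the rank query)
    run_start, length = S[t], lam[t]            # the select query and the array lookup
    left = run_start + ((j - run_start) // length) * length
    return left, length
```

All cycles inside one run have the same length, so the left endpoint is obtained by rounding the offset within the run down to a multiple of that length.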

As an immediate corollary, we get, from Theorem 3.2:

Corollary 3.2. There is a representation to store an arbitrary perm $\pi$ on $[n]$
using at most $\mathcal{P}(n) + O(n(\lg\lg n)^5/(\lg n)^2)$ bits that can support $\pi^k()$ for any $k$
in $O(\lg n/\lg\lg n)$ time.

4. Succinct trees with level-ancestor queries 

In this section we consider the problem of supporting level-ancestor queries
on a static rooted ordered tree. The structure developed here will be used in
the next section as a substructure in representing a function efficiently. Given
a rooted tree $T$ with $n$ nodes, the level-ancestor problem is to preprocess $T$ to
answer queries of the following form: Given a vertex $v$ and an integer $i > 0$, find
the $i$th vertex on the path from $v$ to the root, if it exists. Existing solutions take
$\Theta(n\lg n)$ bits to answer queries in $O(1)$ time [1, [H, [H, [j| , and our solution stores
$T$ using (essentially optimal) $2n$ bits of space, and uses auxiliary structures of
$o(n)$ bits to support level-ancestor queries in $O(1)$ time. Another useful feature
of our solution (which we need in the function representation) is that it also
supports finding the level-successor (or predecessor) of a node, i.e., the node to
the right (left) of a given node on the same level, if it exists, in constant time.

A high-level view of our structure and the query algorithm is as follows: for
any constant $c > 0$ we construct a structure $A$, that given a node $x$ and any
(positive or negative) integer $k$, $|k| \le \lg^c n$, supports finding the ancestor (or
the first successor in pre-order, if $k < 0$) of $x$ whose depth is depth$(x) + k$ (this
structure is our main contribution). Applying the above with $c = 2$ (say), we
also construct another structure, $B$, which supports level-ancestor queries on
nodes whose depths are multiples of $\lg^2 n$, and whose heights are at least $\lg^2 n$.
To support a level-ancestor query, structure $A$ is first used to find the closest
ancestor of the given node, whose depth is a multiple of $\lg^2 n$ and whose height
is at least $\lg^2 n$. Then structure $B$ is used to find the ancestor which is the
closest descendant of the required node and whose depth is a multiple of $\lg^2 n$.
Structure $A$ is again used to find the required node from this node. The choice
of different powers of $\lg n$ in the structures given below is somewhat arbitrary,
and could be fine-tuned to slightly improve the lower-order term.

The structure $A$ consists of the tree $T$ represented in $2n$ bits as a balanced
parenthesis (BP) sequence as in [25], by visiting the nodes of the tree in depth
first order and writing an open parenthesis whenever a node is first visited, and a
closing parenthesis when a node is visited after all its children have been visited.
Thus, each node has exactly one open and one closing parenthesis corresponding
to it. Hereafter, we also refer to a node by the position of either the open or the
closing parenthesis corresponding to it in the BP sequence of the tree. We store
an existing auxiliary structure of size $o(n)$ bits that answers the following queries
in $O(1)$ time on the BP sequence (see [25, 11] for details):



• close$(i)$: find the position of the closing parenthesis that matches the open
parenthesis at position $i$.

• open$(i)$: find the position of the open parenthesis that matches the closing
parenthesis at position $i$.

• excess$(i)$: find the difference between the number of open parentheses and
the number of closing parentheses from the beginning up to the position $i$.

Note that the excess of a position $i$ is simply the depth of the node $i$ in the tree.
Our new contribution is to give an $o(n)$-bit structure to support the following
operation in $O(1)$ time:

• next-excess$(i, k)$: find the least position $j > i$ such that excess$(j) = k$.

We only support this query for excess$(i) - O(\lg^c n) \le k \le$ excess$(i) + O(\lg^c n)$
for some fixed constant $c$. In the following lemma, we fix the value of $c$ to be 2.
Observe that next-excess$(i, k)$ gives:

(a) the ancestor of $i$ at depth $k$, if $k <$ depth$(i)$, and

(b) the next node after $i$ in the level-order traversal of the tree, if $k =$ depth$(i)$,
and

(c) the next node after $i$ in pre-order, if $k >$ depth$(i)$.
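The three cases can be tried out on a small tree with a naive, linear-time Python sketch. Two conventions below are assumptions of this sketch rather than statements from the text: the value attached to a position is the depth of the node owning that parenthesis (root at depth 1, so a closing parenthesis carries the depth of its own node), and cases (a) and (b) are started from the node's closing parenthesis while case (c) is started from its open parenthesis. The function names are hypothetical.

```python
def node_depths(bp):
    """depths[i] = depth of the node whose parenthesis sits at position i."""
    depths, excess = [], 0
    for ch in bp:
        if ch == "(":
            excess += 1
            depths.append(excess)
        else:
            depths.append(excess)   # depth of the node being closed
            excess -= 1
    return depths


def close(bp, i):
    """Position of the closing parenthesis matching the open one at i."""
    level = 0
    for j in range(i, len(bp)):
        level += 1 if bp[j] == "(" else -1
        if level == 0:
            return j
    raise ValueError("unbalanced sequence")


def next_excess(depths, i, k):
    """Least position j > i whose value is k, or None (naive linear scan)."""
    for j in range(i + 1, len(depths)):
        if depths[j] == k:
            return j
    return None


# Tree: root A with children B and C; B has children D and E.
bp = "((()())())"          # positions: A=0, B=1, D=2, E=4, C=7
depths = node_depths(bp)
ancestor_of_D = next_excess(depths, close(bp, 2), 2)       # (a): closing paren of B
level_succ_of_D = next_excess(depths, close(bp, 2), 3)     # (b): open paren of E
first_deeper_after_A = next_excess(depths, 0, 2)           # (c): open paren of B
```

In case (a) the position returned is the closing parenthesis of the ancestor; its open parenthesis is then obtained with the open operation.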






We now describe the auxiliary structure to support the next-excess query in
constant time using $o(n)$ bits of extra space, showing the following:

Theorem 4.1. Given a balanced parenthesis sequence of length $2n$, one can
support the operations open, close, excess and next-excess$(i, k)$ where $|k -$ excess$(i)| \le
\lg^2 n$, all in constant time using an additional index of size $o(n)$ bits.

Proof. The auxiliary structure to support open, close and excess in constant
time using $o(n)$ additional bits has been described by Munro and Raman [25]
(see also [11] for a simpler structure). We now describe the auxiliary structures
required to support the next-excess query in constant time.

We split the parenthesis sequence corresponding to the tree into superblocks
of size $s = \lg^4 n$ and each superblock into blocks of size $b = (\lg n)/2$. Since the
excess values of two consecutive positions differ only by one, the set containing
the excess values of all the positions in a superblock/block forms a single
range of integers, which we denote as the excess-range of the superblock/block.
We store this excess range information for each superblock, which requires
$O(n\lg n/\lg^4 n) = o(n)$ bits for the entire sequence. For each block, we also
store the excess-range information, where excess is defined with respect to the
beginning of the superblock. As the excess-range for each block can be stored
using $O(\lg\lg n)$ bits, the space used over all the blocks is $O(n\lg\lg n/\lg n) = o(n)$
bits.

For each superblock, we store the following structure to support the queries
within the superblock (i.e., if the answer lies in the same superblock as the query
element) in $O(1)$ time:

We build a complete tree with branching factor $\sqrt{\lg n}$ (and hence constant
height) with blocks at the leaves. Each internal node of this tree stores the
excess ranges of all its children, where the excess-range of an internal node
is defined as the union of the excess-ranges of all the leaves in its subtree.
Thus, the size of this structure for each superblock is $O(s\lg\lg n/b) = o(s)$ bits.
Using this structure, given any position $i$ in the superblock and a number $k$,
we can find the position next-excess$(i, k)$ in constant time, if it exists within
the superblock. More specifically, a query is answered by starting at the leaf
(block) $v$ containing the position $i$, traversing the tree upwards till we find the
first ancestor node which has a child with preorder number larger than that of $v$
whose excess-range contains $k$, and then traversing downwards to reach the leaf
containing the answer to the query; searches at the internal nodes and leaves are
performed using precomputed tables, as the information stored at these nodes
is either $O(\sqrt{\lg n}\lg\lg n)$ bits for internal nodes, or $(\lg n)/2$ bits for leaves.

Let $[e_1, e_2]$ be the range of excess values in a superblock $B$. Then for each $i$
such that $e_1 - \lg^2 n \le i < e_1$ or $e_2 < i \le e_2 + \lg^2 n$, we store the least position
to the right of superblock $B$ whose excess is $i$, in an array $A_B$.

In addition, for each $i$, $e_1 \le i \le e_2$, we store a pointer to the first superblock
$B'$ to the right of superblock $B$ such that $B'$ has a position with excess $i$.
Then we remove all multiple pointers (thus each pointer corresponds to a range
of excesses instead of just one excess). The graph representing these pointers
between superblocks is planar. [One way to see this is to draw the graph on
the Euclidean plane so that the vertex corresponding to the $j$-th superblock
$B$, with excess values in the range $[e_1, e_2]$, is represented as a vertical line with
end points $(j, e_1)$ and $(j, e_2)$. Then, there is an edge between two superblocks
$B$ and $B'$ if and only if the vertices (vertical lines) corresponding to these are
'visible' to each other (i.e., a horizontal line connecting these two vertical lines
at some height does not intersect any other vertical lines in the middle).] Since
the number of edges in a planar graph on $m$ vertices is $O(m)$, the number of
these inter-superblock pointers (edges) is $O(n/s)$ as there are $n/s$ superblocks
(vertices). The total space required to store all the pointers and the array $A_B$
is $O(\lg^3 n \cdot (n/s)) = o(n)$ bits.

Thus, each superblock has a set of pointers associated with a set of ranges of
excess values. Given an excess value, we need to find the range containing that
value in a given superblock (if the value belongs to the range of excess values
in that superblock), to find the pointer associated with that range. For this
purpose, we store the following auxiliary structure: If a superblock has more
than $\lg n$ ranges associated with it (i.e., if the degree of the node corresponding
to a superblock in the graph representing the inter-superblock pointers is more
than $\lg n$), then we store a bit vector for that superblock that has a 1 at the
position where a range starts, and 0 everywhere else. We also store an auxiliary
structure to support rank queries on this bit vector in constant time. Since there
are at most $n/(s\lg n)$ superblocks containing more than $\lg n$ ranges, the total
space used for storing all these bit vectors together with the auxiliary structures
is $o(n)$ bits. If a superblock has at most $\lg n$ ranges associated with it, then we
store the lengths of these ranges (from left to right) using the searchable partial
sum structure of [31], that supports predecessor queries in constant time. This
requires $o(s)$ bits for every such superblock, and hence $o(n)$ bits overall.

Given a query next-excess$(i, k)$, let $B$ be the superblock to which the position
$i$ belongs. We first check to see if the answer lies within the superblock $B$
(using the prefix sums tree structure mentioned above), and if so, we output
the position. Otherwise, let $[e_1, e_2]$ be the range of excess values in $B$. If
$e_1 - \lg^2 n \le k < e_1$ or $e_2 < k \le e_2 + \lg^2 n$, then we can find the answer
from the array $A_B$. Otherwise (when $e_1 \le k \le e_2$), we first find the pointer
associated with the range containing $k$ (using either the bit vector or the partial
sum structure, associated with the superblock) and use this pointer to find the
superblock containing the answer. Finding the answer, given the superblock in which
it is contained, is done using the prefix sums tree structure stored for that
superblock.

Thus, using these structures, we can support next-excess$(i, k)$ for any $i$ and
$|k -$ excess$(i)| \le \lg^2 n$ in constant time. □

By using the balanced parenthesis representation of the given tree and by
storing the auxiliary structures of Theorem 4.1, we can support the following:
given a node in the tree find its $k$-th ancestor, for $k \le \lg^2 n$, and also the next
node in the level-order traversal of the tree in constant time. To support general
level ancestor queries, we do as follows.

Firstly, we mark all nodes of the tree that are at a depth which is a multiple
of $\lg^2 n$ and whose height is at least $\lg^2 n$ (similar to [1]). There are $O(n/\lg^2 n)$
such nodes. We store all these marked nodes as a tree (preserving the ancestor
relation among these nodes) and store a linear space (hence $o(n)$-bit) structure
that supports level-ancestor queries in constant time [3]. Note that one level in
this tree corresponds to exactly $\lg^2 n$ levels in the original tree. We also store
the correspondence between the nodes in the original tree and those in the tree
containing only the marked nodes.

A query for level-ancestor$(x, k)$, the ancestor of $x$ at height $k$ from $x$ (i.e., at
depth depth$(x) - k$), is answered as follows: If $k \le \lg^2 n$, we find the answer
using a next-excess query. Otherwise, we first find the least ancestor of $x$ which
is marked using at most two next-excess queries (the first one to find the least
ancestor whose depth is a multiple of $\lg^2 n$, and the next one, if necessary, to
find the marked ancestor whose height is at least $\lg^2 n$). From this we find the
highest marked ancestor of $x$ which is a descendant of the answer node, using the
level-ancestor structure for the marked nodes. The required ancestor is found
from this node using another next-excess query, if necessary.

The query level-successor$(x)$, which returns the successor of node $x$ in the
level order (i.e., the node to the right of $x$ which is in the same level as $x$), can
be supported in constant time using a next-excess$(x,$ depth$(x))$ query. Since all
the nodes in a subtree are together in the parenthesis representation, checking
whether a node $x$ is a descendant of another node $y$ can be done in constant time
by comparing either the open or closing parenthesis position of $x$ with the open
and closing parenthesis positions of $y$. Hence the representation also supports
the is-ancestor operation in constant time.
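The descendant test described above amounts to an interval-containment check on parenthesis positions. A minimal Python sketch follows, with a linear-time matching routine standing in for the constant-time close operation; the names `match_close` and `is_ancestor` are hypothetical, and the choice to treat a node as its own ancestor is an assumption of the sketch.

```python
def match_close(bp, i):
    """Position of the closing parenthesis matching the open one at i (naive scan)."""
    level = 0
    for j in range(i, len(bp)):
        level += 1 if bp[j] == "(" else -1
        if level == 0:
            return j
    raise ValueError("unbalanced sequence")


def is_ancestor(bp, y, x):
    """True if the node opened at position y is x itself or an ancestor of x.

    Both nodes are given by the positions of their open parentheses.
    """
    return y <= x and match_close(bp, x) <= match_close(bp, y)
```

For the sequence `"((()())())"`, the node opened at position 1 encloses the nodes opened at positions 2 and 4 but not the node opened at position 7.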

Thus we have: 

Corollary 4.1. Given an unlabeled rooted tree with $n$ nodes, there is a
structure that represents the tree using $2n + o(n)$ bits of space and supports parent,
first-child, level-ancestor, level-successor and is-ancestor queries in $O(1)$ time.

5. Representing functions 

We now consider the representation of functions $f : [n] \to [n]$. Given such a
function $f$, we equate it to a digraph in which every node is of outdegree 1, and
represent this graph space-efficiently. We then show how to compute arbitrary
powers of the function by translating them into the navigational operations on
the digraph.

More specifically, given an arbitrary function $f : [n] \to [n]$, consider the
digraph $G_f = (V, E)$ obtained from it, where $V = [n]$ and $E = \{(i, j) : f(i) = j\}$.
In general this digraph consists of a set of connected components where each
component has a directed cycle with each vertex being the root of a (possibly
single node) directed tree, with edges directed towards the root. See Figure 4(a)
for an example. We refer to each connected component as a gadget.
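To make the gadget decomposition concrete, the following Python sketch splits the digraph of a function into its connected components and reports the directed cycle of each. It is a plain linear-time traversal written for illustration, not part of the succinct representation, and the function name `gadgets` is hypothetical. The example function is the one used in Figure 4(a).

```python
def gadgets(f):
    """Return one (cycle, members) pair per connected component of the digraph of f."""
    n = len(f)
    state = [0] * n            # 0 = unvisited, 1 = on the current walk, 2 = finished
    comp = [-1] * n
    cycles = []
    for start in range(n):
        if state[start]:
            continue
        path, x = [], start
        while state[x] == 0:   # follow f until we meet something already seen
            state[x] = 1
            path.append(x)
            x = f[x]
        if state[x] == 1:      # the walk closed a new cycle
            cid = len(cycles)
            cycles.append(path[path.index(x):])
        else:                  # the walk ran into an earlier component
            cid = comp[x]
        for y in path:
            state[y] = 2
            comp[y] = cid
    members = [[] for _ in cycles]
    for v in range(n):
        members[comp[v]].append(v)
    return list(zip(cycles, members))


f = [(x * x + 2 * x - 1) % 19 for x in range(19)]
result = gadgets(f)
```

On this example the traversal finds four gadgets, with cycle lengths 1, 1, 2 and 3.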

The main idea of our representation is to store the structure of the graph
$G_f$ as a tree $T_f$ such that the forward and inverse queries can be translated into
appropriate navigational operations on the tree. We store the bijection between
the node labels in $G_f$ and the preorder numbers of the 'corresponding' nodes
in $T_f$ as a perm $\pi$. To support the queries for powers of $f$, we need to find the
node in $T_f$ corresponding to a given label, perform the required navigational
operations on the tree to find the answer node(s), and finally return the label(s)
corresponding to the answer node(s). Hence we store the perm $\pi$ using one of
the perm representations from Section 3 so that $\pi()$ and $\pi^{-1}()$ can be supported
efficiently.

Figure 4: Representing a function

(a) Graph representation of the function $f(x) = (x^2 + 2x - 1) \bmod 19$, for $0 \le x \le 18$. The
vertex labels in the brackets correspond to the function $g$ obtained by renaming the vertices.
[graph not reproduced]

(b) Perm defining the isomorphism between $G_f$ and $G_g$

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18

1 5 4 12 17 9 15 3 13 14 10 16 8 18 11 7 6 2

(c) Parenthesis representation and the bit vectors indicating the starting positions of the
gadgets and the trees (auxiliary structures are not shown)

( ( ) ) ( ) ((()())) ( ( ) ) ( ( ) ) ((()(()()))) ( ( ) )

1000 00 10000000 0000 0000 100000000000 1000
1000 10 10000000 1000 1000 100000000000 1000

We define a gadget to be wide if its cycle length is larger than $\lg^{1/3} n$, and
narrow otherwise. The size of a gadget or a tree is defined as the number of
nodes in it. Before constructing the tree $T_f$, we first re-order the gadgets and
the tree nodes within each gadget as follows: (i) We first order the gadgets so
that all the narrow gadgets are before any of the wide gadgets. (ii) Wide gadgets
are ordered arbitrarily among themselves, while narrow gadgets are ordered in
the non-decreasing order of their sizes. (iii) Within each group of narrow gadgets
with the same size, we arrange them in the non-decreasing order of their cycle
lengths (the cycle length of a gadget is the number of trees in the gadget).
(iv) For each gadget whose cycle length is greater than 1, we break the cycle
by selecting a tree with maximal height among all the trees that belong to the
gadget and deleting the outgoing edge from the root of this tree. We then order
the trees such that the trees are in the reverse order as we move along the cycle
edges in the forward direction (thus the tree with the maximal height that was
selected is the last tree in this order). (v) We also arrange the nodes within
each tree such that the leftmost path of any subtree is the longest path in that
subtree, breaking the ties arbitrarily.

We now construct a tree that encodes the structure of the function $f$. Let
$C_1, C_2, \ldots, C_p$ be the gadgets in $G_f$ and let $T_i^1, T_i^2, \ldots, T_i^{q_i}$ be the trees in the
$i$-th gadget, for $1 \le i \le p$, after the re-ordering of the gadgets and the nodes
within the trees. Let $r_i^j$ be the root of the tree $T_i^j$, for $1 \le i \le p$ and $1 \le j \le q_i$.
We refer to the node $r_i^1$ as the root of the gadget $C_i$.

Construct a tree $T_f$ with root $r$ whose children are the $p$ nodes: $r_1^1, r_2^1, \ldots, r_p^1$.
For $1 \le i \le p$, under the node $r_i^1$ add the path $r_i^1 - r_i^2 - \ldots - r_i^{q_i}$. Also attach
the subtree under the root $r_i^j$ in $T_i^j$ to the node $r_i^j$ in $T_f$. The size of $T_f$ is $n + 1$
(the $n$ nodes in $G_f$ plus the new root $r$). We represent the tree $T_f$ using the
structure of Corollary 4.1 using $2n + o(n)$ bits. Items (iv) and (v) above ensure
that the leftmost path in any subtree of $T_f$ is a longest path in that subtree,
and hence is represented by a sequence of open parentheses in the BP sequence.
This enables us to find the descendant of any node in the subtree at a given
level, if it exists, in constant time.

We number the nodes of $T_f$ with their pre-order numbers, starting from 0
for the root $r$. Every node in the tree $T_f$, except for the root $r$, corresponds to a
unique node in the graph $G_f$, and this correspondence can be easily determined
from the construction of the tree. As mentioned earlier, we store this bijection
$\pi$ between the labels in $G_f$ and the preorder numbers in $T_f$ by representing the
perm $\pi$ so that $\pi()$ and $\pi^{-1}()$ are supported efficiently.

In addition to the perm $\pi$ and the tree $T_f$, we store the following data
structures using $o(n)$ bits:

1. An array $A$ storing the distinct sizes of the narrow gadgets in increasing
order (i.e., the sequence $s_1, s_2, \ldots, s_d$, where $1 \le s_1 < s_2 < \ldots < s_d \le n$,
and for $1 \le i \le d$ there exists a narrow gadget of size $s_i$ in $G_f$). Note
that $d = O(\sqrt{n})$.

2. An FID for the set $B = \{p_1, p_2, \ldots, p_d\}$, where $p_i$ is the preorder number
of the first narrow gadget (in the above ordering) whose size is $s_i$ (or
equivalently, the sum of the sizes of all the narrow gadgets in $G_f$ whose
sizes are less than $s_i$), for $1 \le i \le d$.

3. An FID for the multiset $C = \{s_{i,j}\}$, for $1 \le i \le d$ and $1 \le j \le n^{1/3}$,
where $s_{i,j}$ is the sum of the sizes of all the gadgets whose sizes are: (i) less
than $s_i$, and (ii) equal to $s_i$ whose cycle lengths are at most $j$. (A rank
operation in this FID enables us to find the cycle length of the gadget
containing the node with a given preorder number, if it is in a narrow
gadget).

4. An array $A'$ that stores the size and cycle length of each wide gadget, in
the above ordering of the wide gadgets.

5. An FID for the set $B' = \{p'_1, p'_2, \ldots, p'_{d'}\}$, where $d'$ is the number of wide
gadgets in $G_f$, and $p'_i$ is the preorder number of the root of the $i$-th wide
gadget (in the above ordering).

Given a node in a tree, we can find its $k$-th successor (i.e., the node reached
by traversing $k$ edges in the forward direction), if it exists within the same tree,
in constant time using a level-ancestor query. The $k$-th successor of node $r_i^j$ (the
root of the $j$th tree in the $i$th gadget) can be found in $O(1)$ time by computing
the length of the cycle in the $i$th gadget, using rank and select operations on
the above FIDs. By combining these two, we can find the $k$-th successor of an
arbitrary node in a gadget in constant time.

Given a node $x$ in a gadget, if it is not the root of any tree, then we can
find all its $k$-th predecessors (i.e., all the nodes reachable by traversing $k$ edges
in the reverse direction) in optimal time using the tree structure by finding all
the descendant nodes of $x$ that are $k$ levels below, as follows: we first find the
leftmost descendant in the subtree rooted at $x$ at the given level, if it exists,
in constant time, as the leftmost path is represented by a sequence of open
parentheses in the parenthesis representation of the tree. From this node, we
can find all the nodes at this level by using the level-successor operation to find
the next node at this level, checking whether the node is a descendant of $x$ using
the is-ancestor operation, and stopping when this test fails.
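The enumeration just described can be sketched in Python on a balanced parenthesis string. The sketch assumes, as the construction above arranges, that the leftmost path of every subtree is a longest path; all operations are naive linear scans standing in for the constant-time ones, and the function names are hypothetical.

```python
def node_depths(bp):
    """depths[i] = depth of the node whose parenthesis sits at position i."""
    depths, excess = [], 0
    for ch in bp:
        if ch == "(":
            excess += 1
            depths.append(excess)
        else:
            depths.append(excess)
            excess -= 1
    return depths


def close(bp, i):
    """Position of the closing parenthesis matching the open one at i."""
    level = 0
    for j in range(i, len(bp)):
        level += 1 if bp[j] == "(" else -1
        if level == 0:
            return j
    raise ValueError("unbalanced sequence")


def level_successor(bp, depths, v):
    """Open position of the next node on the same level as v, or None."""
    for j in range(close(bp, v) + 1, len(bp)):
        if depths[j] == depths[v]:
            return j
    return None


def descendants_at_distance(bp, x, k):
    """Open positions of all nodes exactly k levels below the node opened at x."""
    # Leftmost path is longest, so a descendant at distance k exists exactly
    # when x is followed by k consecutive open parentheses.
    if bp[x + 1 : x + k + 1] != "(" * k:
        return []
    depths = node_depths(bp)
    end = close(bp, x)
    result, v = [], x + k
    while v is not None and v < end:     # v < end is the is-ancestor test
        result.append(v)
        v = level_successor(bp, depths, v)
    return result
```

The loop performs one level-successor step per reported node plus one failing step, which mirrors the output-sensitive running time claimed in the text.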

To report the set of all $k$-th predecessors of a node $r_i^j$ (which is the root of
the $j$th tree in the $i$th gadget), if $j + k \le q_i$, then we report all the nodes in the
subtree (of $T_f$) rooted at $r_i^j$ that are at the same level as $r_i^{j+k}$. Otherwise, we
first find all trees $T_i^{j'}$ which contain at least one answer, and then report all the
answers in each of those trees.

Now to find all the trees $T^i_{j'}$ that contain at least one answer, we observe
that if $T^i_{j'}$ contains at least one node that is a $k$-th predecessor of $r^i_j$, then it also
contains at least one node that is a $(q_i + (k \bmod q_i))$-th predecessor of $r^i_j$ (here $q_i$
is the number of trees in the $i$-th gadget). Also, the set of all $(q_i + (k \bmod q_i))$-th
predecessors of $r^i_j$ is a subset of the set of $k$-th predecessors of $r^i_j$, when $k > q_i$.
In other words, the set of all trees that contain at least one $k$-th predecessor of
$r^i_j$ is the same as the set of all trees that contain at least one $(q_i + (k \bmod q_i))$-th
predecessor of $r^i_j$.

Thus to find the $k$-th predecessors of $r^i_j$, we identify two subsets of trees
whose union is the set of all trees in the $i$-th gadget that contain at least one
answer. These two subsets are the set of all trees that contain at least one node

• at a depth of $k$ in the subtree rooted at node $r^i_j$ in $T^i$, and

• at a depth of $k - (q_i - j)$ in the subtree rooted at $r^i_j$ in $T^i$.

Once we identify all the trees containing at least one answer, we can report all
the answer nodes in the tree $T^i$ in time linear in the number of such nodes, as
explained earlier. Each of these node numbers is then transformed into its
corresponding node number in $G_f$ using the representation of $\pi$.
Combining all these, we have:

Theorem 5.1. If there is a representation of a perm on $[n]$ that takes $P(n)$
space and supports forward in $t_f$ time and inverse in $t_i$ time, then there is a
representation of a function $f : [n] \to [n]$ that takes $P(n) + 2n + o(n)$ bits of
space and supports $f^k(i)$ in $O(t_f + t_i \cdot |f^k(i)|)$ time (or in $O(t_i + t_f \cdot |f^k(i)|)$
time), for any integer $k$ (which can be stored in $O(1)$ words) and for any $i \in [n]$.

Using the succinct perm representation of Corollary 13. 11 we get: 

Corollary 5.1. There is a representation of a function $f : [n] \to [n]$ that takes
$(1 + \epsilon) n \lg n + O(1)$ bits of space for any fixed positive constant $\epsilon$, and supports
$f^k(i)$ in $O(1 + |f^k(i)|)$ time, for any integer $k$ (which can be stored in $O(1)$
words) and for any $i \in [n]$.

5.1. Functions with arbitrary ranges 

So far we considered functions whose domain and range are the same set
$[n]$. We now consider functions $f : [n] \to [m]$ whose domain and range are of
different sizes, and deal with the two cases: (i) $n > m$ and (ii) $n < m$ separately.
These results can be easily extended to the case when neither the domain nor
the range is a subset of the other. We only consider the queries for positive
powers.

Case (i) $n > m$: A function $f : [n] \to [m]$, where $n > m$, can be represented
by storing the restriction of $f$ to $[m]$ using the representation mentioned in the
previous section, together with the sequence $S = f(m + 1), f(m + 2), \ldots, f(n)$
stored in an array. This gives a representation that supports forward queries
efficiently.

To support the inverse queries, we store the sequence $S$ using a representation
that supports access and select queries efficiently, where access($i$) returns
the value $f(m + i)$, and select($j$, $k$) returns the $k$-th occurrence of the value $j$ in
the sequence. We use the following representation which is implicit in Golyn- 
ski et al. [l(|: A sequence S of length n from an alphabet of size k (where n > k) 
can be represented as a collection of $\lceil n/k \rceil$ perms over $[k]$ together with $O(n)$
bits such that a select or an access query on $S$ can be answered by performing a
single $\pi()$ or $\pi^{-1}()$ query on one of the perms, together with a constant amount
of computation.
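
To make the role of the perms concrete, the following sketch handles a single chunk of at most $k$ symbols. It is a simplified illustration of the idea and not necessarily the exact construction of the cited work: the perm lists the positions of the chunk sorted by symbol, and a bit vector `X` stores the symbol counts in unary. The function names, the 0-based indexing and the argument order of `select_in_chunk` are assumptions made here, and the linear scans over `X` stand in for constant-time rank/select queries.

```python
def build_chunk(chunk, k):
    """chunk: list of symbols from range(k), of length at most k."""
    # pi lists the positions of the chunk sorted by symbol (ties by position).
    pi = sorted(range(len(chunk)), key=lambda p: (chunk[p], p))
    pi_inv = [0] * len(chunk)
    for r, p in enumerate(pi):
        pi_inv[p] = r
    # X holds, for each symbol c in increasing order, count(c) ones then a zero.
    counts = [0] * k
    for c in chunk:
        counts[c] += 1
    X = []
    for c in range(k):
        X.extend([1] * counts[c] + [0])
    return pi, pi_inv, X


def access(struct, p):
    """Symbol stored at position p of the chunk."""
    pi, pi_inv, X = struct
    r = pi_inv[p]  # a single inverse-perm query
    ones = zeros = 0
    for bit in X:  # symbol = number of zeros before the (r+1)-th one
        if bit:
            if ones == r:
                return zeros
            ones += 1
        else:
            zeros += 1


def select_in_chunk(struct, c, j):
    """Position of the j-th occurrence (j >= 1) of symbol c, or None."""
    pi, pi_inv, X = struct
    zeros = start = count = 0
    for bit in X:
        if bit:
            if zeros == c:
                count += 1   # ones inside the run of symbol c
            elif zeros < c:
                start += 1   # ones belonging to smaller symbols
        else:
            zeros += 1
    if j > count:
        return None
    return pi[start + j - 1]  # a single forward-perm query
```

Each query touches the perm exactly once; everything else is counting in `X`, which uses fewer than $2k$ bits per chunk and is consistent with the $O(n)$ extra bits mentioned above.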

In addition, we augment the directed graph $G_f$, representing the function $f$
restricted to $[m]$, with dummy nodes as follows: if $f(m + i) = j$, then we add a
dummy node $v$ as a 'child' of the node corresponding to $j$ in $G_f$. The node $v$
is a representative of the set $\{i \mid f(i) = j,\ i > m\}$. We represent this augmented
directed graph to support the forward and inverse queries, using $O(m)$ bits.
We also represent the perm that maps the 'real' (non-dummy) nodes to their
original values in the function $f$. Finally, we store an FID that indicates the
positions of the dummy nodes in the order determined by the representation of
$G_f$, using $O(m)$ bits (note that the size of the graph $G_f$ is $O(m)$).

To answer a query $f^k(i)$ for $i \in [n]$ and $k \ge 1$, we first find the node $v$
corresponding to $i$ in the augmented graph $G_f$. The node $v$ is a 'real' node if
$i \le m$, and can be found using the perm $\pi$ that maps the nodes of $G_f$ to their
values in $f$ and the FID indicating the positions of dummy nodes. We then
find the node $u$ that is reached by traversing $k$ edges in the forward direction,
using the structure of $G_f$. Finally, the value corresponding to the node $u$ is
obtained using the perm $\pi$. If $i > m$, then the node $v$ is a dummy node, and we
can find $j = f(i)$ using an access query on the string $S$, and use the fact that
$f^k(i) = f^{k-1}(j)$ to compute the answer.
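
The reduction for elements outside $[m]$ can be sketched as follows. This is a minimal illustration with 0-based indices, so the domain is `range(n)`, the range is `range(m)`, and `S[i - m]` holds $f(i)$ for $i \ge m$; the helper `iterate` is a naive stand-in for the structure on $G_f$ and the names are hypothetical.

```python
def iterate(f_low, j, k):
    # Naive stand-in for the k-th successor query on the graph of f
    # restricted to range(m); the real structure answers this directly.
    for _ in range(k):
        j = f_low[j]
    return j


def power(f_low, S, i, k):
    """f^k(i) for k >= 1, where f_low is f on range(m) and S lists
    f(m), f(m+1), ..., f(n-1)."""
    m = len(f_low)
    if i >= m:
        # Dummy node: one access query on S, then f^k(i) = f^(k-1)(f(i)).
        i = S[i - m]
        k -= 1
    return iterate(f_low, i, k)
```

After the first step the walk never leaves `range(m)`, which is why only the restriction of $f$ needs the full structure.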

To answer a query $f^{-k}(i)$ for $i \in [m]$ and $k \ge 1$, we first find the node
corresponding to the value $i$ in $G_f$, find all the nodes that can be reached by
traversing $k$ edges in the backward direction, and return the values corresponding
to all such nodes. Thus we have:

Theorem 5.2. If there is a representation of a perm on $[n]$ that takes $P(n)$
space and supports forward in $t_f$ time and inverse in $t_i$ time, then there is a
representation of a function $f : [n] \to [m]$, $n > m$, that takes $(n - m) \lceil \lg m \rceil +
P(m) + O(m)$ bits of space and supports $f^k(i)$ in $O(t_f + t_i)$ time, for any positive
integer $k$ and for any $i \in [n]$. There is another representation of $f$ that takes
$\lceil n/m \rceil P(m) + O(m)$ bits that supports, for any $k \ge 1$, $f^k(i)$ in $O(t_f + t_i)$ time,
and $f^{-k}(i)$ in $O(t_f + t_i \cdot |f^{-k}(i)|)$ time (or in $O(t_i + t_f \cdot |f^{-k}(i)|)$ time).

Case (ii) $n < m$: For a function $f : [n] \to [m]$, where $n < m$, larger powers
(i.e., $f^k(i)$ for $k \ge 2$) are not defined in general (as we might go out of the
domain after one or more applications of the function).

Let $R$ be the set of all elements of the range $[m]$ that are greater than $n$
and have pre-images in the domain $[n]$. In the graph $G_f$ representing
the function $f$, each element in $R$ corresponds to the root of a tree with no
outgoing edges. We order these trees such that the elements corresponding to their
roots are in increasing order. We then store an indexable dictionary for the
set $R \subseteq [m]$ using $\lg \binom{m}{|R|} + o(|R|) + O(\lg \lg m)$ bits. Since $|R| \le n$, this space is
at most $n \lg(m/n) + O(n + \lg \lg m)$ bits. The size of the graph $G_f$ is $O(n)$ and
hence it is stored in $O(n)$ bits using the representation described in the previous
section. Finally, we store the correspondence between the node numbering given
by the $O(n)$-bit representation and the actual node labels in $G_f$, except for the
nodes corresponding to $R$. As all these nodes are in the set $[n]$, we need to store
a perm $\pi$ over $[n]$.
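
The step from the dictionary space to the $n \lg(m/n)$ bound can be spelled out with the standard estimate $\binom{m}{r} \le (em/r)^r$. The derivation below is a sketch that writes $r = |R|$ and assumes $1 \le r \le n \le m$; the remaining $o(|R|)$ term is absorbed into $O(n)$.

```latex
\begin{align*}
\lg \binom{m}{r} &\le r \lg \frac{em}{r} \\
  &\le n \lg \frac{em}{n}
     && \text{since } x \lg(em/x) \text{ is nondecreasing for } 0 < x \le m \\
  &= n \lg \frac{m}{n} + n \lg e \\
  &= n \lg(m/n) + O(n).
\end{align*}
```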

A query for $f^k(i)$, for $i \in [n]$ and $k \ge 1$, is answered by first finding the node
corresponding to $i$ in $G_f$ using $\pi$, then finding the $k$-th node in the forward
direction, if it exists, using the structure of $G_f$, and finally finding the element
corresponding to this node, using the representation of $\pi$ again. To find the set
$f^{-k}(i)$, for $i \in [m]$ and $k \ge 1$, we first find the node $x$ corresponding to $i$ in $G_f$
using either the representation of $\pi$ if $i \le n$, or the indexable dictionary
stored for the set $R$ if $n < i \le m$. We then find all the nodes reachable from
$x$ by taking $k$ edges in the backward direction. We finally report the elements
corresponding to each of these nodes, using the representation of $\pi$. Thus we
have:

Theorem 5.3. If there is a representation of a perm on $[n]$ that takes $P(n)$
space and supports forward in $t_f$ time and inverse in $t_i$ time, then there is a
representation of a function $f : [n] \to [m]$, $n < m$, that takes $n \lg(m/n) + P(n) +
O(n)$ bits. For any positive integer $k$, this representation supports the queries
for $f^k(i)$, for any $i \in [n]$ (returning the power if defined and $-1$ otherwise), in
$O(t_f + t_i)$ time, and supports $f^{-k}(i)$, for any $i \in [m]$, in $O(t_f + t_i \cdot |f^{-k}(i)|)$
time (or in $O(t_i + t_f \cdot |f^{-k}(i)|)$ time).
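
The behaviour of the two queries in the case $n < m$ can be illustrated with a naive sketch. It assumes 0-based indices (domain `range(n)`, range `range(m)`), uses explicit pre-image lists in place of the succinct structures, and the function names are hypothetical. Unlike the representation in the theorem, `inverse_power` visits every intermediate level, so its running time is not bounded by the output size.

```python
def build_preimages(f, m):
    """pre[y] lists all x in the domain with f(x) = y, for y in range(m)."""
    pre = [[] for _ in range(m)]
    for x, y in enumerate(f):
        pre[y].append(x)
    return pre


def power(f, i, k):
    """f^k(i) if defined, and -1 if some intermediate value leaves the domain."""
    n = len(f)
    for _ in range(k):
        if i >= n:
            return -1
        i = f[i]
    return i


def inverse_power(pre, i, k):
    """All x with f^k(x) = i, found by following k edges backwards."""
    frontier = [i]
    for _ in range(k):
        frontier = [x for y in frontier for x in pre[y]]
    return frontier
```

With `f = [1, 4, 0, 3]` and `m = 6`, the value 4 lies outside the domain, so `power(f, 0, 2)` returns 4 while `power(f, 0, 3)` returns -1; the node for 4 is a root with no outgoing edge, and `inverse_power` walks down its tree.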